.................... POLLSTER RATINGS DATASET VISUALIZATIONS ....................

-AIM- TO FIND OUT WHICH POLLSTER'S POLLS BEST PREDICT THE WINNING CANDIDATE
Importing Dependencies
In [1]:
import plotly.express as px
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import warnings
warnings.filterwarnings('ignore')
Reading data
In [2]:
df1 = pd.read_excel("POLLSTER RATINGS.xlsx")
In [3]:
df1
Out[3]:
Pollster # of Polls NCPP / AAPOR / Roper Exclusively Live Caller With Cellphones Methodology Banned by 538 Predictive Plus-Minus 538 Grade Mean-Reverted Bias Races Called Correctly ... Simple Average Error Simple Expected Error Simple Plus-Minus Advanced Plus-Minus Mean-Reverted Advanced Plus Minus Predictive Plus-Minus # of Polls for Bias Analysis Bias House Effect Year
0 Selzer & Co. 43 yes yes Live no -1.357517 A+ -0.00366667 0.86 ... 4.20000 5.6 -1.4 -1.900000 -1.1 -1.4 30 -0.00733333 0.149412 2020
1 Monmouth University 95 yes yes Live no -1.287058 A+ 1.42589 0.8 ... 5.30000 6.2 -0.8 -1.500000 -1.1 -1.3 65 2.084 -0.697921 2020
2 Field Research Corp. (Field Poll) 25 yes yes Live no -1.142149 A+ -1.21896 1 ... 3.90000 5.7 -1.8 -2.500000 -1.1 -1.1 18 -3.25056 0.614626 2020
3 ABC News/Washington Post 60 yes yes Live no -1.073369 A+ 0.613412 0.78 ... 2.90000 4.6 -1.8 -1.300000 -0.9 -1.1 55 0.948 1.48025 2020
4 Elway Research 21 yes yes Live no -1.056747 A+ 0.4736 0.9 ... 3.90000 5.9 -1.9 -2.200000 -0.9 -1.1 20 1.184 3.24362 2020
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1100 Millersville University 5 None None None None 1.553966 D- -1.49457 None ... 14.06200 None 8.87764 6.069618 None None None None None 2014
1101 Massie & Associates 2 None None None None 1.656743 D- -1.49375 None ... 23.90000 None 17.5788 15.057520 None None None None None 2014
1102 Humphrey Institute 10 yes None None None 1.897668 D- 0.798112 None ... 13.06640 None 7.42463 6.714734 None None None None None 2014
1103 Zogby Interactive/JZ Analytics 86 None None None None 2.502591 F -1.44392 None ... 6.51235 None 1.884 2.738137 None None None None None 2014
1104 TCJ Research 133 None None None yes 2.894006 F -4.50972 None ... 6.44887 None 2.09732 3.000317 None None None None None 2014

1105 rows × 21 columns

Exploratory Data Analysis
In [4]:
df1.columns
Out[4]:
Index(['Pollster', '# of Polls', 'NCPP / AAPOR / Roper',
       'Exclusively Live Caller With Cellphones', 'Methodology',
       'Banned by 538', 'Predictive    Plus-Minus', '538 Grade',
       'Mean-Reverted Bias', 'Races Called Correctly', 'Misses Outside MOE',
       'Simple Average Error', 'Simple Expected Error', 'Simple Plus-Minus',
       'Advanced Plus-Minus', 'Mean-Reverted Advanced Plus Minus',
       'Predictive Plus-Minus', '# of Polls for Bias Analysis', 'Bias',
       'House Effect', 'Year'],
      dtype='object')
In [5]:
df1.isnull().sum()
Out[5]:
Pollster                                   0
# of Polls                                 0
NCPP / AAPOR / Roper                       0
Exclusively Live Caller With Cellphones    0
Methodology                                0
Banned by 538                              0
Predictive    Plus-Minus                   0
538 Grade                                  0
Mean-Reverted Bias                         0
Races Called Correctly                     0
Misses Outside MOE                         0
Simple Average Error                       0
Simple Expected Error                      0
Simple Plus-Minus                          0
Advanced Plus-Minus                        0
Mean-Reverted Advanced Plus Minus          0
Predictive Plus-Minus                      0
# of Polls for Bias Analysis               0
Bias                                       0
House Effect                               0
Year                                       0
dtype: int64
In [6]:
df1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1105 entries, 0 to 1104
Data columns (total 21 columns):
 #   Column                                   Non-Null Count  Dtype  
---  ------                                   --------------  -----  
 0   Pollster                                 1105 non-null   object 
 1   # of Polls                               1105 non-null   int64  
 2   NCPP / AAPOR / Roper                     1105 non-null   object 
 3   Exclusively Live Caller With Cellphones  1105 non-null   object 
 4   Methodology                              1105 non-null   object 
 5   Banned by 538                            1105 non-null   object 
 6   Predictive    Plus-Minus                 1105 non-null   float64
 7   538 Grade                                1105 non-null   object 
 8   Mean-Reverted Bias                       1105 non-null   object 
 9   Races Called Correctly                   1105 non-null   object 
 10  Misses Outside MOE                       1105 non-null   object 
 11  Simple Average Error                     1105 non-null   float64
 12  Simple Expected Error                    1105 non-null   object 
 13  Simple Plus-Minus                        1105 non-null   object 
 14  Advanced Plus-Minus                      1105 non-null   float64
 15  Mean-Reverted Advanced Plus Minus        1105 non-null   object 
 16  Predictive Plus-Minus                    1105 non-null   object 
 17  # of Polls for Bias Analysis             1105 non-null   object 
 18  Bias                                     1105 non-null   object 
 19  House Effect                             1105 non-null   object 
 20  Year                                     1105 non-null   int64  
dtypes: float64(3), int64(2), object(16)
memory usage: 181.4+ KB
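The info() output lists 16 object columns although most of them hold numbers, and isnull() reports no missing values even though the table preview shows None entries. One plausible reading (an assumption, not something the notebook confirms) is that missing values are stored as the text "None". A minimal sketch of converting such columns, using a small made-up sample rather than the real file:

```python
import pandas as pd

# Hypothetical three-row sample that mimics the mixed columns seen above.
sample = pd.DataFrame({
    "Pollster": ["A", "B", "C"],
    "Bias": ["-0.0073", "2.084", "None"],
    "House Effect": [0.149412, -0.697921, "None"],
})

numeric_cols = ["Bias", "House Effect"]  # columns expected to be numeric

# errors="coerce" turns anything unparseable (such as the text "None") into NaN.
cleaned = sample.copy()
cleaned[numeric_cols] = cleaned[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(cleaned.dtypes)
print(cleaned.isnull().sum())
```

After a conversion like this, isnull() and describe() would report the real gaps and cover the converted columns as well.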
In [7]:
df1.nunique()
Out[7]:
Pollster                                    464
# of Polls                                  104
NCPP / AAPOR / Roper                          3
Exclusively Live Caller With Cellphones       4
Methodology                                  13
Banned by 538                                 3
Predictive    Plus-Minus                   1001
538 Grade                                    13
Mean-Reverted Bias                          919
Races Called Correctly                      131
Misses Outside MOE                           58
Simple Average Error                        736
Simple Expected Error                        58
Simple Plus-Minus                           464
Advanced Plus-Minus                         831
Mean-Reverted Advanced Plus Minus            30
Predictive Plus-Minus                        39
# of Polls for Bias Analysis                 80
Bias                                        363
House Effect                                326
Year                                          3
dtype: int64
In [8]:
df1.describe()
Out[8]:
# of Polls Predictive Plus-Minus Simple Average Error Advanced Plus-Minus Year
count 1105.000000 1105.000000 1105.000000 1105.000000 1105.000000
mean 20.907692 0.462501 6.453120 0.420432 2016.823529
std 72.642941 0.554639 4.526929 3.785362 2.506258
min 1.000000 -1.357517 0.000000 -6.939324 2014.000000
25% 2.000000 0.173766 3.845555 -1.693069 2014.000000
50% 4.000000 0.538635 5.485185 -0.164117 2016.000000
75% 11.000000 0.775159 7.700000 1.600000 2020.000000
max 777.000000 3.025044 42.940000 32.700000 2020.000000
1. Which pollster has the maximum number of polls in each year (2020, 2016, 2014)?
In [9]:
fig = px.pie(df1.query("Year == 2020"), values="# of Polls", names="Pollster",hover_name="Year",
             title='NUMBER OF POLLS vs POLLSTER corresponding YEAR --> 2020',hole=.5)
fig.update_traces(textposition='inside')  
# RGB channels range from 0 to 255 and alpha from 0 to 1, so the values are kept in range.
fig.update_layout({'paper_bgcolor': 'rgba(222, 255, 255, 1)'},uniformtext_minsize=10, 
                  uniformtext_mode='hide',font=dict(color="Blue"))
fig.show()
In [10]:
fig = px.pie(df1.query("Year == 2016"), values="# of Polls", names="Pollster",hover_name="Year",
             title='NUMBER OF POLLS vs POLLSTER corresponding YEAR --> 2016',hole=.5,)
fig.update_traces(textposition='inside', textfont_color="Black")  
fig.update_layout({'paper_bgcolor': 'rgb(0,0,0)'},uniformtext_minsize=10, uniformtext_mode='hide',font=dict(color="white"))
fig.show()
In [11]:
fig = px.pie(df1.query("Year == 2014"), values="# of Polls", names="Pollster",hover_name="Year",
             title='NUMBER OF POLLS vs POLLSTER corresponding YEAR --> 2014',hole=.5)
fig.update_traces(textposition='inside')  
fig.update_layout({'paper_bgcolor': 'rgb(255,255,160)'},uniformtext_minsize=10, uniformtext_mode='hide',font=dict(color="Black"))
fig.show()
In [12]:
fig = px.bar(df1.query("Year == 2020"),x="Pollster",y="# of Polls", color="Pollster",barmode="group",
             animation_frame="# of Polls",animation_group="Pollster",hover_name="Year",
             title='NUMBER OF POLLS vs POLLSTER corresponding YEAR --> 2020',text="# of Polls",range_y=[0,777],
             range_x=['Winthrop University','SurveyUSA'])
fig.update_layout({'plot_bgcolor': 'rgba(111, 255, 255, 1)','paper_bgcolor': 'rgba(222, 255, 255, 1)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',showlegend = False,
                 xaxis={'categoryorder':'category ascending'})
fig.update_traces(texttemplate='%{text:.3s}', textposition='outside')

fig.show()
In [13]:
fig = px.bar(df1.query("Year == 2016"),x="Pollster",y="# of Polls", color="Pollster",barmode="group",
             animation_frame="# of Polls",animation_group="Pollster",hover_name="Year",
             title='NUMBER OF POLLS vs POLLSTER corresponding YEAR --> 2016',text="# of Polls",range_y=[0,777],range_x=['University of Florida','SurveyUSA'])
fig.update_layout({'plot_bgcolor': 'rgb(255,0,0)','paper_bgcolor': 'rgb(255,150,150)'},showlegend = False,
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),
                 xaxis={'categoryorder':'category ascending'})
fig.update_traces(texttemplate='%{text:.3s}', textposition='outside')

fig.show()
In [14]:
fig = px.bar(df1.query("Year == 2014"),x="Pollster",y="# of Polls", color="Pollster",barmode="group",
             animation_frame="# of Polls",animation_group="Pollster",hover_name="Year",
             title='NUMBER OF POLLS vs POLLSTER corresponding YEAR --> 2014',text="# of Polls",range_y=[0,777],range_x=['Castleton State College','SurveyUSA'])
fig.update_layout({'plot_bgcolor': 'rgb(255,153,0)','paper_bgcolor': 'rgb(0,0,0)'},showlegend = False,
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),
                 xaxis={'categoryorder':'category ascending'})
fig.update_traces(texttemplate='%{text:.3s}', textposition='outside')
fig.show()
OBSERVATION --> From these plots, in 2020 the pollster "Rasmussen Reports/Pulse Opinion Research" had the highest number of polls (711), and in 2016 and 2014 "SurveyUSA" had the highest number of polls (763, 722). However, the polling percentage decreases gradually every year.
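The top pollster per year can also be read from the data directly instead of from the charts. The sketch below uses made-up rows with the same three column names as df1; the pollster names and counts are placeholders, not values from the dataset.

```python
import pandas as pd

# Made-up rows for illustration; the real df1 has the same three columns.
polls = pd.DataFrame({
    "Pollster": ["P1", "P2", "P3", "P1", "P2", "P3"],
    "# of Polls": [10, 40, 25, 30, 5, 12],
    "Year": [2016, 2016, 2016, 2020, 2020, 2020],
})

# idxmax returns, for each year, the row label holding the largest poll count.
top_rows = polls.loc[polls.groupby("Year")["# of Polls"].idxmax()]
top_by_year = top_rows.set_index("Year")[["Pollster", "# of Polls"]]
print(top_by_year)
```

Applied to df1, the same two lines would give one row per year with the leading pollster and its poll count.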

---------------------------------------------------------------------------------------------------------------

2. What are the responses for "American Association for Public Opinion Research" / "Roper Center for Public Opinion Research" membership, and what is the poll count for each?
In [15]:
fig = px.bar(df1, x="NCPP / AAPOR / Roper", y="# of Polls", color="NCPP / AAPOR / Roper",barmode="group",
             animation_frame="# of Polls",animation_group="NCPP / AAPOR / Roper",range_y=[0,777],hover_name="Year",
             title='NUMBER OF POLLS vs NCPP / AAPOR / ROPER',text="# of Polls")

fig.update_layout({'plot_bgcolor': 'rgba(111, 255, 255, 1)','paper_bgcolor': 'rgba(222, 255, 255, 1)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',xaxis={'categoryorder':'category ascending'})

fig.update_traces(texttemplate='%{text:.3s}', textposition='outside')

fig.show()
In [16]:
fig = px.scatter(df1, x="NCPP / AAPOR / Roper", y="# of Polls", color="NCPP / AAPOR / Roper",size_max=100,
             animation_frame="# of Polls",animation_group="NCPP / AAPOR / Roper",range_y=[0,777],
                 size="# of Polls",hover_name="Year",
                title='NUMBER OF POLLS vs NCPP / AAPOR / ROPER')

fig.update_layout({'plot_bgcolor': 'rgb(255,204,153)','paper_bgcolor': 'rgb(255,102,0)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),xaxis={'categoryorder':'category ascending'})

fig.show()
In [17]:
fig = px.histogram(df1, x="NCPP / AAPOR / Roper", y="# of Polls",color="NCPP / AAPOR / Roper",
             animation_frame="# of Polls",animation_group="NCPP / AAPOR / Roper",range_y=[0,777],hover_name="Year",
                  title='NUMBER OF POLLS vs NCPP / AAPOR / ROPER')

fig.update_layout({'plot_bgcolor': 'rgb(26,76,52)','paper_bgcolor': 'rgb(51,153,102)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),xaxis={'categoryorder':'category ascending'})

fig.show()
In [18]:
fig = px.density_heatmap(df1, x="NCPP / AAPOR / Roper",y="# of Polls",range_y=[0,150],
                        title='NUMBER OF POLLS vs NCPP / AAPOR / ROPER')

fig.update_layout({'plot_bgcolor': 'rgb(128,0,0)','paper_bgcolor': 'rgb(128,0,0)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),
                 xaxis={'categoryorder':'category ascending'})


fig.show()
In [19]:
from plotly import graph_objects as go
fig = go.Figure(layout={'plot_bgcolor':'skyblue','title':'NCPP / AAPOR / Roper response in corresponding YEAR'})

fig.add_trace(go.Funnel(
    name = '2020',
    orientation = "h",
    y = ["no","yes"],
    x = [339,57],marker={'color':['#DB7F93','#DB7F93']},
    textposition = "inside",
    textinfo = "percent total"))
fig.add_trace(go.Funnel(
    name = '2016',
    orientation = "h",
    y = ["no","yes"],
    x = [318,54],marker={'color':['#9BD353','#9BD353']},
    textposition = "inside",
    textinfo = "percent total"))
fig.add_trace(go.Funnel(
    name = '2014',
    orientation = "h",
    y = ["None","yes"],
    x = [283,54],marker={'color':['#A44E92','#A44E92']},
    textposition = "outside",
    textinfo = 'percent total'))
fig.update_layout(font=dict(color="black",size = 15),yaxis={'categoryorder':'category ascending'})

fig.show()
OBSERVATION --> From these plots, the most common "NCPP / AAPOR / Roper" response is "NO" (49%, in 627 polls), and the "American Association for Public Opinion Research" results do not occur in 2014, where the label is "None" instead. In 2020: "yes" 14%, "No" 86%. In 2016: "yes" 15%, "No" 85%. In 2014: "yes" 16%, "No" 84%.
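The yearly percentages can be checked from the counts hard-coded in the funnel traces. This sketch only re-uses those six numbers; the column name no_or_None is a label chosen here because 2014 uses "None" where the other years use "no".

```python
import pandas as pd

# Counts copied from the funnel traces above (rows per answer and year).
counts = pd.DataFrame(
    {"no_or_None": [339, 318, 283], "yes": [57, 54, 54]},
    index=pd.Index([2020, 2016, 2014], name="Year"),
)

# Divide each row by its total to get the share of each answer per year.
shares = counts.div(counts.sum(axis=1), axis=0) * 100
print(shares.round(0))
```

On the full frame, pd.crosstab(df1["Year"], df1["NCPP / AAPOR / Roper"], normalize="index") should produce the same kind of table without typing the counts by hand.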

--------------------------------------------------------------------------------------------------

3. What are the "Exclusively Live Caller With Cellphones" responses, and what is the poll count for each?
In [20]:
fig = px.bar(df1, x="Exclusively Live Caller With Cellphones", y="# of Polls", color="Exclusively Live Caller With Cellphones",
             barmode="group",hover_name="Year",
             animation_frame="# of Polls",animation_group="Exclusively Live Caller With Cellphones",range_y=[0,400],
            title='NUMBER OF POLLS vs EXCLUSIVELY LIVE CALLER WITH CELLPHONES',text="# of Polls")

fig.update_layout({'plot_bgcolor': 'rgb(255,204,1)','paper_bgcolor': 'rgb(255,255,1)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',
                 font=dict(color="black"),xaxis={'categoryorder':'category ascending'})

fig.update_traces(texttemplate='%{text:.3s}', textposition='outside')

fig.show()
In [21]:
fig = px.scatter(df1, x="Exclusively Live Caller With Cellphones", y="# of Polls", 
                 color="Exclusively Live Caller With Cellphones",
             animation_frame="# of Polls",size_max=100,width=1080, height=600,hover_name="Year",
                 animation_group="Exclusively Live Caller With Cellphones",range_y=[0,777],size="# of Polls",
                title='NUMBER OF POLLS vs EXCLUSIVELY LIVE CALLER WITH CELLPHONES')

fig.update_layout({'plot_bgcolor': 'rgb(204,204,255)','paper_bgcolor': 'rgb(0,0,128)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"))

fig.show()
In [22]:
fig = px.histogram(df1, x="Exclusively Live Caller With Cellphones", y="# of Polls",
                   color="Exclusively Live Caller With Cellphones",
             animation_frame="# of Polls",animation_group="Exclusively Live Caller With Cellphones",
                   range_y=[0,777],hover_name="Year",
                  title='NUMBER OF POLLS vs EXCLUSIVELY LIVE CALLER WITH CELLPHONES')

fig.update_layout({'plot_bgcolor': 'rgb(190,190,190)','paper_bgcolor': 'rgb(150,150,150)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),
                 xaxis={'categoryorder':'category ascending'})

fig.show()
In [23]:
fig = px.density_heatmap(df1, x="Exclusively Live Caller With Cellphones",y="# of Polls",range_y=[0,150],
                        title='NUMBER OF POLLS vs EXCLUSIVELY LIVE CALLER WITH CELLPHONES')

fig.update_layout({'plot_bgcolor': 'rgb(255,0,255)','paper_bgcolor': 'rgb(255,180,204)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="black"),
                 xaxis={'categoryorder':'category ascending'})

fig.show()
In [24]:
fig = px.box(df1, x="Exclusively Live Caller With Cellphones",y="# of Polls",range_y=[0,380]
             ,color = "Exclusively Live Caller With Cellphones",
            title='NUMBER OF POLLS vs EXCLUSIVELY LIVE CALLER WITH CELLPHONES')

fig.update_layout({'plot_bgcolor': 'rgb(255,255,255)','paper_bgcolor': 'rgb(0,0,255)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),
                  xaxis={'categoryorder':'category ascending'})

fig.show()
OBSERVATION --> From these plots, the most common "Exclusively Live Caller With Cellphones" response is "YES" (49%, in 438 polls); these pollsters always conduct polls via live interviewers who place calls to cellphones in addition to landlines. The distribution has outliers, and "Exclusively Live Caller With Cellphones" results do not occur in 2014.
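The outliers mentioned above are the points a box plot draws beyond its whiskers, commonly placed at 1.5 times the interquartile range. A minimal sketch of that rule, using a hypothetical helper iqr_outliers and made-up poll counts:

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking points outside the k * IQR fences."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Made-up poll counts with one very large value.
n_polls = pd.Series([1, 2, 2, 4, 5, 11, 300])
mask = iqr_outliers(n_polls)
print(n_polls[mask])
```

Running the helper per group, for example with df1.groupby(...)["# of Polls"].transform(iqr_outliers), would list which pollsters sit outside the whiskers for each response.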

-----------------------------------------------------------------------------------------------------------

4. Which methodologies does a pollster routinely use in its election polls in campaign cycles?
In [25]:
fig = px.bar(df1, x="Methodology", y="# of Polls", color="Methodology",barmode="group",
             animation_frame="# of Polls",animation_group="Methodology",range_y=[0,777],hover_name="Year",
            title='NUMBER OF POLLS vs METHODOLOGY',text="# of Polls")

fig.update_layout({'plot_bgcolor': 'rgb(153,20,102)','paper_bgcolor': 'rgb(153,204,0)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',
                 font=dict(color="white"),
                 xaxis={'categoryorder':'category ascending'})

fig.update_traces(texttemplate='%{text:.3s}', textposition='outside')


fig.show()
In [26]:
fig = px.scatter(df1, x="Methodology", y="# of Polls", color="Methodology",
             animation_frame="# of Polls",size_max=100,hover_name="Year",
                 animation_group="Methodology",range_y=[0,777],size="# of Polls",
                title='NUMBER OF POLLS vs METHODOLOGY')

fig.update_layout({'plot_bgcolor': 'rgb(255,255,150)','paper_bgcolor': 'rgb(255,0,0)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"))


fig.show()
In [27]:
fig = px.histogram(df1, x="Methodology", y="# of Polls",color="Methodology",
             animation_frame="# of Polls",animation_group="Methodology",range_y=[0,777],
                  title='NUMBER OF POLLS vs METHODOLOGY')

fig.update_layout({'plot_bgcolor': 'rgb(224,255,255)','paper_bgcolor': 'rgb(0,128,128)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"))

fig.show()
In [28]:
fig = px.strip(df1, x="Methodology",y="# of Polls",range_y=[-1,300],hover_name="Year",color = "Methodology",
              title='NUMBER OF POLLS vs METHODOLOGY')

fig.update_layout({'plot_bgcolor': 'rgb(255,255,255)','paper_bgcolor': 'rgb(0,255,0)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="black"),
                 xaxis={'categoryorder':'category ascending'})

fig.show()
In [29]:
fig = px.box(df1, x="Methodology",y="# of Polls",range_y=[-1,200],color="Methodology",
            title='NUMBER OF POLLS vs METHODOLOGY')

fig.update_layout({'plot_bgcolor': 'rgb(255,255,255)','paper_bgcolor': 'rgb(255,20,147)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),
                 xaxis={'categoryorder':'category ascending'})

fig.show()
OBSERVATION --> From these plots, the methodologies pollsters routinely use in their election polls in campaign cycles are mostly "Interactive voice response" and "Live telephone interviews, including cellphones". The distribution has outliers, and the "Methodology" survey was not conducted in 2014.
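To put numbers behind "mostly", the poll counts can be summed per methodology and year. The rows below are made up and the methodology labels are placeholders; only the column names match df1.

```python
import pandas as pd

# Made-up rows; the methodology labels here are placeholders.
rows = pd.DataFrame({
    "Methodology": ["IVR", "Live", "IVR", "Online", "Live", "IVR"],
    "# of Polls": [20, 15, 5, 8, 30, 12],
    "Year": [2016, 2016, 2016, 2020, 2020, 2020],
})

# One row per methodology, one column per year, cells hold total polls.
totals = rows.pivot_table(
    index="Methodology", columns="Year", values="# of Polls",
    aggfunc="sum", fill_value=0,
)
print(totals)
```

A zero in a year column shows directly that a methodology has no polls recorded for that year.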

---------------------------------------------------------------------------------------------------------------

5. What is the polling firm usage rate by "Banned by FiveThirtyEight" status?
In [30]:
fig = px.bar(df1, x="Banned by 538", y="# of Polls", color="Banned by 538",barmode="stack",
             animation_frame="# of Polls",animation_group="Banned by 538",range_y=[0,777],hover_name="Year"
            ,title='NUMBER OF POLLS vs BANNED BY 538',text="# of Polls")

fig.update_layout({'plot_bgcolor': 'rgb(0,0,0)','paper_bgcolor': 'rgb(218,165,32)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',
                 font=dict(color="white"),
                 xaxis={'categoryorder':'category ascending'})

fig.update_traces(texttemplate='%{text:.3s}', textposition='outside')

fig.show()
In [31]:
fig = px.scatter(df1, x="Banned by 538", y="# of Polls", color="Banned by 538",
             animation_frame="# of Polls",size_max=100,hover_name="Year",
                 animation_group="Banned by 538",range_y=[0,777],size="# of Polls",
                title='NUMBER OF POLLS vs BANNED BY 538')

fig.update_layout({'plot_bgcolor': 'rgb(216,191,216)','paper_bgcolor': 'rgb(139,0,139)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),
                 )

fig.show()
In [32]:
fig = px.histogram(df1, x="Banned by 538", y="# of Polls",color="Banned by 538",
             animation_frame="# of Polls",animation_group="Banned by 538",range_y=[0,777],
                  title='NUMBER OF POLLS vs BANNED BY 538')

fig.update_layout({'plot_bgcolor': 'rgb(0,255,0)','paper_bgcolor': 'rgb(255,215,0)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="black"),
                 xaxis={'categoryorder':'category ascending'})


fig.show()
In [33]:
fig = px.density_heatmap(df1, x="Banned by 538",y="# of Polls",range_y=[0,150],
                        title='NUMBER OF POLLS vs BANNED BY 538')

fig.update_layout({'plot_bgcolor': 'rgb(255,0,255)','paper_bgcolor': 'rgb(127,255,212)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="black"),
                 xaxis={'categoryorder':'category ascending'})

fig.show()
In [34]:
fig = px.strip(df1, x="Banned by 538",y="# of Polls",range_y=[-1,300],hover_name="Year",color = "Banned by 538",
              stripmode='overlay',title='NUMBER OF POLLS vs BANNED BY 538')

fig.update_layout({'plot_bgcolor': 'rgb(255,255,255)','paper_bgcolor': 'rgb(0,128,0)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),
                 xaxis={'categoryorder':'category ascending'})

fig.show()
In [35]:
fig = px.box(df1, x="Banned by 538",y="# of Polls",range_y=[-1,135],color="Banned by 538",
            title='NUMBER OF POLLS vs BANNED BY 538')

fig.update_layout({'plot_bgcolor': 'rgb(238,232,170)','paper_bgcolor': 'rgb(189,183,107)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),
                 xaxis={'categoryorder':'category ascending'})


fig.show()
OBSERVATION --> From these plots, most responses for "Banned by FiveThirtyEight" are "NO" (708 polls), so most polling firms are still used by FiveThirtyEight. The "Banned by FiveThirtyEight" action had not started in 2014.

----------------------------------------------------------------------------------------------------------

6. Which "Grade" has the highest number of polls?
In [36]:
fig = px.bar(df1, x="538 Grade", y="# of Polls", color="538 Grade",barmode="stack",
             animation_frame="# of Polls",animation_group="538 Grade",range_y=[0,777],hover_name="Year",
         title='NUMBER OF POLLS vs 538 GRADE',text="# of Polls")

fig.update_layout({'plot_bgcolor':'rgb(0,0,0)','paper_bgcolor':'rgb(218,165,32)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),
                 xaxis={'categoryorder':'category ascending'})


fig.update_yaxes(showgrid=True)

fig.update_traces(texttemplate='%{text:.3s}', textposition='outside')

fig.show()
In [37]:
fig = px.scatter(df1, x="538 Grade", y="# of Polls", color="538 Grade",
             animation_frame="# of Polls",size_max=100,hover_name="Year",
                 animation_group="538 Grade",range_y=[0,777],size="# of Polls",
                title='NUMBER OF POLLS vs 538 GRADE')

fig.update_layout({'plot_bgcolor': 'rgba(111, 255, 255, 1)','paper_bgcolor': 'rgba(222, 255, 255, 1)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="black"))
fig.show()
In [38]:
fig = px.histogram(df1, x="538 Grade", y="# of Polls",color="538 Grade",
             animation_frame="# of Polls",animation_group="538 Grade",range_y=[0,777],
                  title='NUMBER OF POLLS vs 538 GRADE')


fig.update_layout({'plot_bgcolor': 'rgb(244,164,96)','paper_bgcolor': 'rgb(245,222,179)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="black"),
                 xaxis={'categoryorder':'category ascending'})
fig.show()
In [39]:
fig = px.density_heatmap(df1, x="538 Grade",y="# of Polls",range_y=[0,150],
                        title='NUMBER OF POLLS vs 538 GRADE',hover_name = "Pollster")


fig.update_layout({'plot_bgcolor': 'rgb(255,0,255)','paper_bgcolor': 'rgb(127,255,212)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="black"),
                 xaxis={'categoryorder':'category ascending'})

fig.show()
In [40]:
from plotly import graph_objects as go
fig = go.Figure(layout={'plot_bgcolor':'skyblue','title':'NUMBER OF POLLS vs 538 GRADE'})

fig.add_trace(go.Funnel(
    name = "'C' Grade",
    orientation = "h",
    y = ["C","C+","C-"],
    x = [1193,669,433],marker={'color':['#FF1493','#FFB6C1','#FAEBD7']},
    textposition = "inside",
    textinfo = "value"))

fig.add_trace(go.Funnel(
    name = "'B' Grade",
    orientation = "h",
    y = ["B+","B","B-"],
    x = [794,481,367],marker={'color':['#228B22','#32CD32','#98FB98']},
    textposition = "inside",
    textinfo = "value"))

fig.add_trace(go.Funnel(
    name = "'A' Grade",
    orientation = "h",
    y = ["A","A-","A+"],
    x = [797,332,72],marker={'color':['#B8860B','#DAA520','#EEE8AA']},
    textposition = "inside",
    textinfo = 'value'))

fig.add_trace(go.Funnel(
    name = "'D' Grade",
    orientation = "h",
    y = ["D+","D","D-"],
    x = [142,104,10],marker={'color':['#00CED1','#00FFFF','#AFEEEE']},
    textposition = "inside",
    textinfo = 'value'))

fig.add_trace(go.Funnel(
    name = "'F' Grade",
    orientation = "h",
    y = ["F"],
    x = [219],marker={'color':['#FF0000']},
    textposition = "inside",
    textinfo = 'value'))


fig.update_layout(height=650,width=1000,font=dict(color="black",size = 15))

fig.show()
In [41]:
fig = px.box(df1, x="538 Grade",y="# of Polls",range_y=[-1,150],color = "538 Grade",
            title='NUMBER OF POLLS vs 538 GRADE')

fig.update_layout({'plot_bgcolor': 'rgb(0,0,0)','paper_bgcolor': 'rgb(192,192,192)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="black"),
                 xaxis={'categoryorder':'category ascending'})

fig.show()
In [42]:
fig = px.bar(df1, x="538 Grade", y="# of Polls",hover_name="Year",color = "Year",
            range_y=[0,3500])

fig.update_layout({'plot_bgcolor': 'rgba(111, 255, 255, 1)','paper_bgcolor': 'rgba(222, 255, 255, 1)'}
                 ,font=dict(color="black"),yaxis=dict(showticklabels=False))

fig.show()
OBSERVATION --> From these plots, the "A" grade has the highest number of polls (777), in Year = 2020, but most pollsters received a "C+" grade.
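The observation separates two questions: which grade has the most polls, and which grade has the most pollsters. Both can be computed in one pass, and an ordered categorical keeps the grades in rank order rather than alphabetical order. The rows are made up; GRADE_ORDER assumes the 13 grade labels used in the funnel chart are the only ones in the column.

```python
import pandas as pd

# Assumed grade labels, best to worst, taken from the funnel chart above.
GRADE_ORDER = ["A+", "A", "A-", "B+", "B", "B-", "C+", "C", "C-", "D+", "D", "D-", "F"]

# Made-up rows for illustration.
graded = pd.DataFrame({
    "538 Grade": ["C+", "A", "F", "C+", "A-", "A"],
    "# of Polls": [3, 50, 7, 4, 10, 20],
})

# An ordered categorical makes sorting follow grade rank, not the alphabet.
graded["538 Grade"] = pd.Categorical(
    graded["538 Grade"], categories=GRADE_ORDER, ordered=True
)

summary = (
    graded.groupby("538 Grade", observed=True)["# of Polls"]
    .agg(total_polls="sum", pollsters="count")
)
print(summary)
```

Any grade label missing from GRADE_ORDER would become NaN in this sketch, so the list should be checked against df1["538 Grade"].unique() first.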

---------------------------------------------------------------------------------------------------------------

7. How many polls occurred per "Year" for each response in "538 Grade", "NCPP / AAPOR / Roper", "Exclusively Live Caller With Cellphones", "Methodology", and "Banned by 538"?
In [43]:
sns.set(rc={'axes.facecolor':'w', 'figure.facecolor':'gold'})

fig, axes = plt.subplots(5,figsize=(18,25))

sns.barplot(x=df1["538 Grade"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[0],palette="Set1")
sns.barplot(x=df1["NCPP / AAPOR / Roper"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[1],palette="Set1")
sns.barplot(x=df1["Exclusively Live Caller With Cellphones"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[2],palette="Set1")
sns.barplot(x=df1["Methodology"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[3],palette="Set1")
sns.barplot(x=df1["Banned by 538"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[4],palette="Set1")
plt.show()
In [44]:
sns.set(rc={'axes.facecolor':'w', 'figure.facecolor':'lightgreen'})

fig, axes = plt.subplots(5,figsize=(18,25))

sns.lineplot(x=df1["538 Grade"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[0],palette="Set1",style = df1["Year"],
             markers=True, dashes=False, lw=1)
sns.lineplot(x=df1["NCPP / AAPOR / Roper"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[1],palette="Set1",style = df1["Year"]
            ,markers=True, dashes=False, lw=3)
sns.lineplot(x=df1["Exclusively Live Caller With Cellphones"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[2],palette="Set1",
             style = df1["Year"],markers=True, dashes=False, lw=3)
sns.lineplot(x=df1["Methodology"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[3],palette="Set1",style = df1["Year"],
            markers=True, dashes=False, lw=3)
sns.lineplot(x=df1["Banned by 538"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[4],palette="Set1",style = df1["Year"],
            markers=True, dashes=False, lw=3)
plt.show()
In [45]:
sns.set(rc={'axes.facecolor':'w', 'figure.facecolor':'lightpink'})

fig, axes = plt.subplots(5,figsize=(18,25))


sns.pointplot(x=df1["538 Grade"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[0],palette="Set1",
            ci=25,capsize=.1,markers = '^')
sns.pointplot(x=df1["NCPP / AAPOR / Roper"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[1],palette="Set1",
             ci=25,capsize=.1,markers = '*')
sns.pointplot(x=df1["Exclusively Live Caller With Cellphones"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[2],palette="Set1",
             ci=25,capsize=.1,markers = 'D')
sns.pointplot(x=df1["Methodology"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[3],palette="Set1",
             ci=25,capsize=.1,markers = 's')
sns.pointplot(x=df1["Banned by 538"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[4],palette="Set1",
             ci=25,capsize=.1,markers = '+')
plt.show()
In [46]:
sns.set(rc={'axes.facecolor':'w', 'figure.facecolor':'gold'})

fig, axes = plt.subplots(5,figsize=(18,25))

sns.boxenplot(x=df1["538 Grade"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[0],palette="Set1")
sns.boxenplot(x=df1["NCPP / AAPOR / Roper"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[1],palette="Set1")
sns.boxenplot(x=df1["Exclusively Live Caller With Cellphones"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[2],palette="Set1")
sns.boxenplot(x=df1["Methodology"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[3],palette="Set1")
sns.boxenplot(x=df1["Banned by 538"],y=df1["# of Polls"],hue=df1["Year"],ax = axes[4],palette="Set1")
plt.show()
OBSERVATION --> From these plots: in 538 Grade, "A" and "F" have the highest number of polls in 2014, around 150, with outliers. In NCPP / AAPOR / Roper (public endorsement), "yes" has 45 polls in 2020, 42 polls in 2016 and 40 polls in 2014. In Exclusively Live Caller With Cellphones, "sometimes" reaches 150 polls in 2014. In Methodology, "IVR", "Live" and "Online" reach around 250 polls, only in 2020. In Banned by 538, "yes" has 150 polls in 2014, 100 polls in 2016 and 80 polls in 2014.
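The numbers read off the plots above could also be checked with a grouped summary. The following is only a minimal sketch on a toy table: the frame `polls` and its values are made up for illustration, and only the column names are taken from df1.

```python
import pandas as pd

# Toy stand-in for df1; the rows are illustrative, not taken from the real dataset.
polls = pd.DataFrame({
    "538 Grade": ["A+", "A+", "D-", "D-", "A+"],
    "Year": [2020, 2020, 2014, 2014, 2014],
    "# of Polls": [43, 95, 5, 10, 20],
})

# Count, median and maximum of "# of Polls" for every (Year, grade) pair,
# i.e. the quantities the line, point and boxen plots summarise visually.
summary = polls.groupby(["Year", "538 Grade"])["# of Polls"].agg(["count", "median", "max"])
print(summary)
```

The same call should work for the other categorical columns ("Methodology", "Banned by 538" and so on) by swapping the second grouping key.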

-------------------------------------------------------------------------------------------------------------

8. Which "Pollster" is most "biased" in which "Year"?
In [47]:
fig = px.bar(df1,x="Pollster",y ="# of Polls for Bias Analysis",color = "Pollster",
            animation_frame="# of Polls for Bias Analysis",animation_group="Pollster",hover_name="Year",
            title='POLLS FOR BIAS ANALYSIS vs POLLSTER corresponding YEAR',text="# of Polls for Bias Analysis",
            range_y=[0,652])

fig.update_layout({'plot_bgcolor': 'rgb(0,255,255)','paper_bgcolor': 'rgb(245,222,179)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="black"),
                 xaxis={'categoryorder':'category ascending'})
fig.show()
In [48]:
fig = px.pie(df1.query("Year == 2020"), values='# of Polls for Bias Analysis', names='Pollster',hover_name="Year",
            hole=.5,title='POLLS FOR BIAS ANALYSIS vs POLLSTER corresponding YEAR --> 2020')
fig.update_traces(textposition='inside')

fig.update_layout({'paper_bgcolor': 'rgb(0,255,0)'},uniformtext_minsize=10, 
                  uniformtext_mode='hide',font=dict(color="Black"))
fig.show()
In [49]:
fig = px.pie(df1.query("Year == 2016"), values='# of Polls for Bias Analysis', names='Pollster',hover_name="Year",
            hole=.5,title='POLLS FOR BIAS ANALYSIS vs POLLSTER corresponding YEAR --> 2016')
fig.update_traces(textposition='inside')

fig.update_layout({'paper_bgcolor': 'rgb(192,192,192)'},uniformtext_minsize=10, 
                  uniformtext_mode='hide',font=dict(color="white"))
fig.show()
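OBSERVATION --> From these plots, the pollster with the most polls in the bias analysis is "YouGov" in 2016, with 652 polls. Note that "# of Polls for Bias Analysis" is a count of polls, so it shows how much data the bias estimate rests on rather than how large the bias is. No bias analysis is recorded for 2014.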
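The pollster with the largest count per year can also be looked up directly instead of being read from the charts. This is a sketch under assumptions: the frame `toy` and its pollster names and counts are hypothetical, and it assumes missing entries (as in the 2014 rows) can be coerced to NaN.

```python
import pandas as pd

# Toy stand-in for df1; pollster names and counts are made up for illustration.
toy = pd.DataFrame({
    "Pollster": ["Pollster A", "Pollster B", "Pollster C", "Pollster D", "Pollster E"],
    "Year": [2014, 2016, 2016, 2020, 2020],
    "# of Polls for Bias Analysis": [None, 652, 30, 65, 55],
})

col = "# of Polls for Bias Analysis"

# Coerce to numbers so missing entries become NaN, then drop those rows.
counts = toy.assign(**{col: pd.to_numeric(toy[col], errors="coerce")}).dropna(subset=[col])

# For each year, keep the row whose poll count is the largest.
top_per_year = counts.loc[counts.groupby("Year")[col].idxmax(), ["Year", "Pollster", col]]
print(top_per_year.to_string(index=False))
```

Years with no bias analysis drop out of the result, which matches the absence of 2014 in the plots.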

---------------------------------------------------------------------------------------------------

9. What is the range between AVERAGE & EXPECTED error?
In [50]:
fig = px.density_heatmap(df1, x="Simple Average Error", y="Simple Expected Error",
                         marginal_x="box", marginal_y="histogram",range_x=[0,25],hover_data=['Year','Pollster'],
                         title="AVERAGE ERROR & EXPECTED ERROR in YEAR")


fig.update_layout({'paper_bgcolor': 'rgb(0,255,255)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="black"),
                 xaxis={'categoryorder':'category ascending'},width = 1000)

fig.show()
In [51]:
fig = px.scatter(df1.query("Year != 2014"), x="Simple Average Error", y="Simple Expected Error",
                title="AVERAGE ERROR & EXPECTED ERROR in YEAR",hover_data=["Year","Pollster"],
                marginal_x="box", marginal_y="histogram",range_x=[0,25],range_y=[2,10])

fig.update_traces(marker=dict(size=12,
                              line=dict(width=2,
                                        color='DarkSlateGrey')),
                  selector=dict(mode='markers'))
fig.update_layout({'paper_bgcolor': 'rgb(255,0,0)'},
                 uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),
                 xaxis={'categoryorder':'category ascending'},width = 950,height = 500)

fig.show()
OBSERVATION --> From these plots, the errors are most densely concentrated between 4.0 and 6.0, so the margin between a firm's average error and the expected error for the races it surveyed is about 1.
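The margin mentioned above can be computed per pollster as a simple difference. The sketch below uses only the first three rows visible in Out[3], so it illustrates the calculation and is not a summary of the whole dataset; the frame name `errors` is an assumption.

```python
import pandas as pd

# First three rows of Out[3]: Selzer & Co., Monmouth University, Field Research Corp.
errors = pd.DataFrame({
    "Pollster": ["Selzer & Co.", "Monmouth University", "Field Research Corp. (Field Poll)"],
    "Simple Average Error": [4.2, 5.3, 3.9],
    "Simple Expected Error": [5.6, 6.2, 5.7],
})

# Negative gap: the firm's average error was smaller than expected for its races.
errors["Gap"] = (errors["Simple Average Error"] - errors["Simple Expected Error"]).round(1)
print(errors[["Pollster", "Gap"]])
print("Mean absolute gap:", errors["Gap"].abs().mean().round(2))
```

For these rows the gap comes out close to the "Simple Plus-Minus" values shown in Out[3], which suggests that column carries the same comparison.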

-----------------------------------------------------------------------------------------------------

10. What is the year-wise response in "PREDICTIVE & ADVANCED PLUS MINUS"?
In [52]:
fig = px.histogram(df1, x="Predictive    Plus-Minus", y="Advanced Plus-Minus", color="Year", 
                   marginal="box",title="PREDICTIVE & ADVANCED PLUS MINUS in YEAR")

fig.update_layout({'plot_bgcolor': 'rgb(224,255,255)','paper_bgcolor': 'rgb(255,215,0)'},
                 uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="black"),
                 xaxis={'categoryorder':'category ascending'},width = 1000,height = 600)
fig.show()
In [53]:
fig = px.density_contour(df1, x="Predictive    Plus-Minus", y="Advanced Plus-Minus", color="Year",
                  marginal_x="box", marginal_y="histogram",
                  range_x=[-1,1.5],range_y = [-10,10],
                  title="PREDICTIVE & ADVANCED PLUS MINUS in YEAR")

fig.update_layout({'plot_bgcolor': 'rgb(255,255,255)','paper_bgcolor': 'rgb(255,20,147)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),
                 xaxis={'categoryorder':'category ascending'},width = 1000)

fig.show()
In [54]:
fig = px.density_heatmap(df1, x="Predictive    Plus-Minus", y="Advanced Plus-Minus",
                  marginal_x="box", marginal_y="histogram",color_continuous_scale=px.colors.sequential.Cividis_r,
                  range_x=[-1,1.5],range_y = [-10,10],hover_data=["Year","# of Polls","Methodology"],
                       title="PREDICTIVE & ADVANCED PLUS MINUS in YEAR" )

fig.update_layout({'plot_bgcolor': 'rgb(128,0,128)','paper_bgcolor': 'rgb(0,128,0)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),
                 xaxis={'categoryorder':'category ascending'},width = 980)
fig.show()
In [55]:
sns.set(rc={'axes.facecolor':'cyan', 'figure.facecolor':'khaki'})
fig, axes = plt.subplots(1,2,figsize=(20 ,8))

sns.boxplot(x=df1["Year"],y=df1["Predictive    Plus-Minus"],ax = axes[0],palette="Set1")
sns.boxplot(x=df1["Year"],y=df1["Advanced Plus-Minus"],ax = axes[1],palette="Set1")
plt.show()
OBSERVATION --> From these plots, the plus-minus values are most densely concentrated in the range 0 to 2. Negative scores are favorable and indicate above-average quality (i.e. "Methodology" --> "Live"). Both Advanced & Predictive Plus-Minus scores have outliers, and in 2014 both scores indicate high methodological quality.
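A per-year median is one way to back up the box plots with numbers. The sketch below uses only the six rows visible in Out[3] (the top three of 2020 and the bottom three of 2014), so the result is not representative of the full data. It also assumes the first "Predictive Plus-Minus" column in Out[3] is the one the code addresses with extra spaces in its name.

```python
import pandas as pd

# Rows visible in Out[3]; the extra spaces in the first column name follow the notebook code.
pm = pd.DataFrame({
    "Year": [2020, 2020, 2020, 2014, 2014, 2014],
    "Predictive    Plus-Minus": [-1.357517, -1.287058, -1.142149, 1.553966, 1.656743, 1.897668],
    "Advanced Plus-Minus": [-1.9, -1.5, -2.5, 6.069618, 15.057520, 6.714734],
})

# Median of each score per year; negative medians point to above-average quality.
medians = pm.groupby("Year")[["Predictive    Plus-Minus", "Advanced Plus-Minus"]].median()
print(medians)
```

Run on the full df1, the same groupby would give the center line of each box in the plots above.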

----------------------------------------------------------------------------------------------

11. What is the year-wise response in "SIMPLE & Mean-Reverted ADVANCED PLUS MINUS"?
In [56]:
fig = px.histogram(df1.query("Year != 2016"), x="Simple Plus-Minus",y = "Mean-Reverted Advanced Plus Minus", 
                   color="Year", marginal="box", 
                   hover_data=["Year","Advanced Plus-Minus","Mean-Reverted Advanced Plus Minus","Predictive Plus-Minus"],
                  range_x=[-10,20],
                 title="SIMPLE & MEAN REVERTED ADVANCED PLUS MINUS in YEAR" ,
                  color_discrete_sequence= px.colors.sequential.Plasma_r,)

fig.update_layout({'plot_bgcolor': 'rgb(0,0,0)','paper_bgcolor': 'rgb(0,250,154)'},
                 uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="black"),
                 xaxis={'categoryorder':'category ascending'},width = 980,height = 600)

fig.show()
In [57]:
fig = px.scatter(df1.query("Year != 2016"), y="Simple Plus-Minus",x = "Mean-Reverted Advanced Plus Minus",
           size="Year", size_max=25,color = "Year",marginal_x="box",marginal_y="violin",
                 hover_data=["Year","Advanced Plus-Minus","Mean-Reverted Advanced Plus Minus","Predictive Plus-Minus"],
         color_discrete_sequence= px.colors.sequential.Tealgrn_r,
                title="SIMPLE & MEAN REVERTED ADVANCED PLUS MINUS in YEAR")

fig.update_traces(marker=dict(size=22,
                              line=dict(width=2,
                                        color='DarkSlateGrey')),
                  selector=dict(mode='markers'))

fig.update_layout({'plot_bgcolor': 'rgb(245,255,250)','paper_bgcolor': 'rgb(128,0,0)'},
                 uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="White"),
                 xaxis={'categoryorder':'category ascending'},width = 1080,height = 600)

fig.show()
OBSERVATION --> From these plots, in 2020 the values fall mostly between -10 and 15 and between -1 and 1 for the two scores respectively. Negative scores are favorable and indicate above-average quality. Both "SIMPLE & MEAN REVERTED ADVANCED PLUS MINUS" scores have outliers in 2020 and 2014, and no results for them are present in 2016.

----------------------------------------------------------------------------------------------------

12. Which pollster has the maximum and minimum Republican and Democratic scores in 2020 (BIAS & HOUSE EFFECT)?
In [58]:
fig = px.density_contour(df1.query("Year == 2020"),x = "Bias",y = "House Effect",range_x=[-10,10],
                        animation_frame="Pollster",range_y=[-2.5,5.0],hover_data=["Bias","House Effect"],
                        title="BIAS & HOUSE EFFECT in YEAR --> 2020",
                        color="Pollster")
fig.update_traces(line_width=2)
fig.update_layout({'plot_bgcolor': 'rgb(255,255,255)','paper_bgcolor': 'rgb(255,69,0)'},
                 uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),
                 xaxis={'categoryorder':'category ascending'},width = 980)
    

fig.show()
In [59]:
fig = px.density_heatmap(df1.query("Year == 2020"),x = "Bias",y = "House Effect",range_x=[-25,20],
                          range_y=[-20,20],hover_data=["Bias","House Effect"],
                        color_continuous_scale=px.colors.sequential.Viridis_r,
                       title="BIAS & HOUSE EFFECT in YEAR --> 2020" )

fig.update_layout({'paper_bgcolor': 'rgb(255,20,147)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="white"),width = 980)

fig.show()
In [60]:
fig = px.histogram(df1.query("Year == 2020"),x = "Bias",y = "House Effect",color="Year",
                   marginal="box",color_discrete_sequence=px.colors.sequential.Cividis_r,range_x=[-25,25],
                  title="BIAS & HOUSE EFFECT in YEAR --> 2020")

fig.update_layout({'plot_bgcolor': 'rgb(0,0,0)','paper_bgcolor': 'rgb(0,250,154)'},showlegend=False,
                 uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="black"),
                 xaxis={'categoryorder':'category ascending'},width = 980,height = 600,
                )

fig.show()
OBSERVATION --> From these plots:
BIAS in 2020: the Republican score is -20 (pollster "Massie & Associates") and the Democratic score is 40 (pollster "Jayhawk Consulting").
HOUSE EFFECT in 2020: the Republican score is -25 (pollster "Fort Hays State University") and the Democratic score is 30 (pollster "Riggs Research Services").
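The extreme pollsters named above can be found without reading them off the charts. This is a minimal sketch on a toy table: the frame `toy2020` and its names and values are hypothetical. It follows the observation's reading that the most negative value is the Republican extreme and the most positive one the Democratic extreme.

```python
import pandas as pd

# Toy stand-in for df1.query("Year == 2020"); names and values are made up.
toy2020 = pd.DataFrame({
    "Pollster": ["Pollster A", "Pollster B", "Pollster C"],
    "Bias": [-3.2, 0.9, 2.1],
    "House Effect": [0.6, 1.5, -0.7],
}).set_index("Pollster")

# idxmin / idxmax return the index label (here the pollster) of each column's extreme.
extremes = pd.DataFrame({
    "most_negative": toy2020.idxmin(),
    "most_positive": toy2020.idxmax(),
})
print(extremes)
```

On the real data, rows with missing "Bias" or "House Effect" would need to be converted to numbers first, for example with pd.to_numeric(..., errors="coerce").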

-------------------------------------------------------------------------------------------------

13. What is the "MARGIN OF ERROR" response of pollsters in 2020?
In [61]:
fig = px.scatter(df1.query("Year == 2020"),x="Pollster",y="Misses Outside MOE",color="Pollster",
                marginal_x="rug",marginal_y = "histogram",
                title = "MISSES OUTSIDE MOE vs POLLSTER in YEAR --> 2020")

fig.update_traces(marker=dict(size=25,line=dict(width=1,color='DarkSlateGrey')),
                  selector=dict(mode='markers'))

fig.update_layout({'plot_bgcolor': 'rgb(245,255,250)','paper_bgcolor': 'rgb(255,215,0)'}
                 ,font=dict(color="black"),xaxis=dict(showticklabels=False)
                 ,width = 980,height = 550,showlegend = False)
fig.show()
In [62]:
fig = px.bar(df1.query("Year == 2020"),x="Pollster",y="Misses Outside MOE", color="Pollster",barmode="group",
             animation_frame="Misses Outside MOE",animation_group="Pollster",hover_name="Year",range_y=[0,1],
             title='MISSES OUTSIDE MOE vs POLLSTER in YEAR --> 2020',text="Misses Outside MOE")

fig.update_layout({'plot_bgcolor': 'rgb(238,232,170)','paper_bgcolor': 'rgb(218,165,32)'},
                  uniformtext_minsize=8, uniformtext_mode='hide',font=dict(color="black"),
                 xaxis={'categoryorder':'category ascending'})

# Show the 0-1 fraction as a whole-number percentage on each bar.
fig.update_traces(texttemplate='%{text:.0%}', textposition='outside')

fig.show()
OBSERVATION --> From these plots, most pollsters miss outside the margin of error in 0 to 50% of their polls, and the miss percentages appear in increasing order.
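The share of pollsters in the 0 to 50% range can be quantified with a simple binning. The sketch below is illustrative only: the frame `moe` and its values are hypothetical, and it assumes "Misses Outside MOE" is stored as a fraction between 0 and 1, as the range_y=[0,1] setting suggests.

```python
import pandas as pd

# Toy stand-in for the 2020 rows; names and fractions are made up for illustration.
moe = pd.DataFrame({
    "Pollster": [f"Pollster {c}" for c in "ABCDEF"],
    "Misses Outside MOE": [0.0, 0.1, 0.25, 0.5, 0.75, 1.0],
})

# Split the 0-1 range into quarters and compute the share of pollsters in each.
bins = [0.0, 0.25, 0.5, 0.75, 1.0]
binned = pd.cut(moe["Misses Outside MOE"], bins=bins, include_lowest=True)
share_per_bin = binned.value_counts(normalize=True).sort_index()

# Share of pollsters that miss outside the margin of error at most half the time.
share_at_most_half = (moe["Misses Outside MOE"] <= 0.5).mean()
print(share_per_bin)
print("Share with at most 50% misses:", round(share_at_most_half, 2))
```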

------------------------------------------------------------------------------------------------------------

-CONCLUSION- FROM ALL THESE VISUALIZATIONS WE CONCLUDE THAT THE POLLSTER "Rasmussen Reports/Pulse Opinion Research" HAD THE HIGHEST POLL RATING, SO IT SHOULD GIVE THE BEST PREDICTION OF THE WINNING CANDIDATE.